Abstract

In this paper, we consider Steiner forest and its generalizations, prize-collecting Steiner forest
and k-Steiner forest, when the vertices of the input graph are points in the Euclidean plane and
the lengths are Euclidean distances. First, we present a simpler analysis of the polynomial-time
approximation scheme (PTAS) of Borradaile et al. [12] for the Euclidean Steiner forest problem.
This is done by proving a new structural property and modifying the dynamic programming
by adding a new piece of information to each dynamic programming state. Next we develop a
PTAS for a well-motivated case, i.e., the multiplicative case, of prize-collecting and budgeted
Steiner forest. The ideas used in the algorithm may have applications in the design of a broad class
of bicriteria PTASs. At the end, we demonstrate why PTASs for these problems can be hard in
the general Euclidean case (and thus for PTASs we cannot go beyond the multiplicative case).

1 Introduction 

Prize-collecting Steiner problems are well-known network design problems with several
applications in expanding telecommunications networks (see e.g. [25, 32]), cost sharing, and
Lagrangian relaxation techniques (see e.g. [24, 15]). The most general version of these problems is
called the prize-collecting Steiner forest (PCSF) problem in which, given a graph $G = (V, E)$, a set
of (commodity) pairs $\mathcal{D} = \{(s_1, t_1), (s_2, t_2), \ldots\}$, a non-negative cost function
$c : E \to \mathbb{Q}^{\geq 0}$, and finally a non-negative penalty function
$\pi : \mathcal{D} \to \mathbb{Q}^{\geq 0}$, our goal is a minimum-cost way of buying
a set of edges and paying the penalty for those pairs which are not connected via bought edges.
When all penalties are $\infty$, the problem is the classic APX-hard Steiner forest problem for which
the best approximation factor is $2 - \frac{2}{n}$ ($n$ is the number of vertices of the graph) due to
Goemans and Williamson [19]. When all sinks are identical in the PCSF problem, it is the classic
prize-collecting Steiner tree problem. Bienstock, Goemans, Simchi-Levi, and Williamson [8] first
considered this problem (based on a problem earlier proposed by Balas [3]) for which they gave a
3-approximation algorithm. The current best approximation algorithm for this problem is a recent
1.992-approximation algorithm of Archer, Bateni, Hajiaghayi, and Karloff [1] improving upon a
primal-dual $(2 - \frac{1}{n-1})$-approximation algorithm of Goemans and Williamson [19]. When in
addition all penalties are $\infty$, the problem is the classic Steiner tree problem, which is known to be
APX-hard [7] and for which the best known approximation factor is 1.55 [31].
There are several 3-approximation algorithms for the prize-collecting Steiner forest problem
using LP rounding, primal-dual, or iterative rounding methods, which were initiated by Hajiaghayi
and Jain [22] (see [H [23]). Currently the best approximation for this problem is a randomized
2.54-approximation algorithm [22]. The approach of Hajiaghayi and Jain has been generalized by
Sharma, Swamy, and Williamson [33] for network design problems where violating arbitrary 0-1
connectivity constraints is allowed in exchange for a very general penalty function.

Lots of attention has been paid to budgeted versions of Steiner problems as well. In the $k$-Steiner
forest (or just $k$-forest for abbreviation), given a graph $G = (V, E)$ and a set of (commodity) pairs
$\mathcal{D}$, the goal is to find a minimum-cost forest that connects at least $k$ pairs of $\mathcal{D}$. The best current
approximation factor for this problem is in $O(\min\{\sqrt{k}, \sqrt{n}\})$ [21]. On the other hand, Hajiaghayi
and Jain [22] could transform the notorious dense $k$-subgraph problem to this problem, for which the
current best approximation factor is $O(n^{1/3 - \epsilon})$ [16]. The special case in which we have a root $r$ and $\mathcal{D}$
consists of all pairs $(r, v)$ for $v \in V(G) - \{r\}$ is the well-known NP-hard $k$-MST problem. The
first non-trivial approximation algorithm for the $k$-MST problem was given by Ravi et al. [30],
who achieved an approximation ratio of $O(\sqrt{k})$. Later this approximation ratio was improved to a
constant by Blum et al. [9]. Currently the best approximation factor for this problem is 2 due to
Garg Plj. 

In this paper, we consider Euclidean prize-collecting Steiner forest and Euclidean $k$-forest in
which the vertices of the input graph are points in the Euclidean plane (or low-dimensional
Euclidean space) and the lengths are Euclidean distances. For the Euclidean Steiner tree problem,
Arora [2] and Mitchell [29] gave polynomial-time approximation schemes (PTASs). Recently
Borradaile, Klein and Kenyon-Mathieu [12] claimed a PTAS for the more general problem of Euclidean
Steiner forest.

1.1 Problem definition 

Motivated by settings in which the demand of each pair is the product of the weight of the origin
vertex and the weight of the destination vertex in the pair, so that in a sense the contribution of each
vertex to all adjacent pairs is the same (e.g., see product multi-commodity flow in Leighton and
Rao [27] or [10, 26], and its applications in wireless networks [28] or routing [131 E]), we consider
the following multiplicative version of prize-collecting Steiner forest for the Euclidean case.

In the multiplicative prize-collecting Steiner forest (MPCSF) problem, given an undirected
graph $G(V, E)$ with non-negative edge lengths $c_e$ for each edge $e \in E$, and also given weights
$\phi(v)$ for each vertex $v \in V$, our goal is to find a forest $F$ which minimizes the cost

$$\sum_{e \in F} c_e \;+ \sum_{u, v \in V:\ u \text{ and } v \text{ are not connected via } F} \phi(u)\phi(v).$$

Indeed, this is an instance of PCSF in which each ordered vertex pair $(u, v)$ forms a request with
penalty $\phi(u)\phi(v)$.$^2$ We may be asked to collect a certain prize $S$, in which case the goal is to find
the forest $F$ of minimum cost for which

$$\sum_{u, v \in V:\ u \text{ and } v \text{ are connected via } F} \phi(u)\phi(v) \geq S.$$

$^2$ We can change the definition to unordered pairs, whose treatment requires only slight modifications
of the algorithms. Currently, each unordered pair $(u, v)$ has a prize of $2\phi(u)\phi(v)$ if $u \neq v$.
Let us call this problem $S$-MPCSF. We show that this is a generalization of the $k$-MST problem
(see Appendix A.2) and thus currently there is no approximation better than 2 for this problem
either. When working on the Euclidean case, the input does not include any Steiner vertices, as all
the points of the plane are potential Steiner points.
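To make the two objectives concrete, the following Python sketch evaluates the MPCSF cost and the collected prize of a given forest. It is only an illustration under stated assumptions: vertices are indexed from 0, the forest is given as a list of index pairs, and the helper names (`components`, `collected_prize`, `mpcsf_cost`) are hypothetical. It uses the fact that, over ordered pairs, the prize collected inside one component equals the square of the component's total weight, so the penalty paid is the total prize minus the collected prize.

```python
import math


def components(n, edges):
    """Return a component label for each of the n vertices (union-find)."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        parent[find(u)] = find(v)
    return [find(v) for v in range(n)]


def collected_prize(phi, edges):
    """Sum of phi(u)*phi(v) over ordered pairs (u, v) connected via the forest."""
    weight_of = {}
    for v, c in enumerate(components(len(phi), edges)):
        weight_of[c] = weight_of.get(c, 0) + phi[v]
    # Within a component of total weight w, ordered pairs contribute w * w.
    return sum(w * w for w in weight_of.values())


def mpcsf_cost(coords, phi, edges):
    """Euclidean length of the forest plus the penalty of disconnected ordered pairs."""
    length = sum(math.dist(coords[u], coords[v]) for u, v in edges)
    penalty = sum(phi) ** 2 - collected_prize(phi, edges)
    return length + penalty
```

For three points with weights 1, 2, 3 and a single edge of length 5 between the first two, the collected prize is $(1+2)^2 + 3^2 = 18$ and the cost is $5 + (36 - 18) = 23$.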

A bicriteria $(\alpha, \beta)$-approximate solution for the $S$-MPCSF problem is one whose cost is at
most $\alpha\,\mathrm{OPT}$, yet collects a prize of at least $\beta S$. Our main contribution in this paper is a bicriteria
$(1 + \epsilon, 1 - \epsilon')$-approximation algorithm that runs in time exponential in $1/\epsilon$ but polynomial in $n$
and $1/\epsilon'$. We then use this algorithm to obtain a PTAS for MPCSF.

1.2 Our contribution 

First of all, we present a simpler analysis for the algorithm of Borradaile et al. [12] for the Euclidean
Steiner forest problem and reprove the following theorem.

Theorem 1. For any constant $\epsilon > 0$, there is an algorithm that runs in polynomial time and
approximates the Euclidean Steiner forest problem within $1 + \epsilon$ of the optimal solution.

This is done by modifying the dynamic programming (DP) algorithm so that instead of storing
paths enclosing the zones in the algorithm by Borradaile et al., we use a bitmap to identify a
zone. The modification results in simplification of the structural property required for the proof
of correctness (see Section 3). We prove this structural property in Theorem 6. The proof has
some ideas similar to [12], but we present a simpler charging scheme that has a universal treatment
throughout. Next we give an overview of the dynamic programming algorithm in Section 4. We
have recently come to know that similar simplifications have been independently discovered by the
authors of [12], too.

Next we extend the algorithm for Euclidean $S$-MPCSF and MPCSF problems in Section 5.

Theorem 2. For any $\epsilon, \epsilon' > 0$, there is a bicriteria $(1 + \epsilon, 1 - \epsilon')$-approximation algorithm for the
Euclidean $S$-MPCSF problem that runs in time polynomial in $n$ and $1/\epsilon'$ and exponential in $1/\epsilon$.

Notice that $\epsilon'$ need not be a constant. In particular, if all weights are polynomially bounded
integers, we can find in polynomial time a $(1 + \epsilon)$-approximate solution that collects a prize of at
least $S$; this can be done by picking $\epsilon'$ to be sufficiently small ($\epsilon'^{-1}$ is still polynomial). Next we
present a PTAS for Euclidean MPCSF.

Theorem 3. For any constant $\epsilon > 0$, there is a $(1 + \epsilon)$-approximation algorithm for the Euclidean MPCSF
problem that runs in polynomial time.

We also study the case of asymmetric prizes for vertices in which each vertex $v$ has two types
of weights (type one and type two) and the prize for an ordered pair $(u, v)$ is the product of the
first type weight of $u$, i.e., $\phi^s(u)$, and the second type weight of $v$, i.e., $\phi^t(v)$. This case is especially
interesting because it generalizes the multiplicative prize-collecting problem when we have two
disjoint sets $S_1$ and $S_2$ and we pay the multiplicative penalty only when two vertices, one in $S_1$ and
the other one in $S_2$, are not connected (by letting for each vertex in $S_1$ the first type weight be its
actual weight and the second type weight be zero, and for each vertex in $S_2$ the first type weight be
zero and the second type weight be its actual weight). After hinting at the arising complications,
we show how we can extend our algorithms for this case as well.
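The reduction from two disjoint sets to asymmetric weights can be written down directly. The sketch below is a small illustration only; the function names and the dictionary representation of weights are assumptions made for the example.

```python
def asymmetric_weights(weight, s1, s2):
    """Build first-type and second-type weights for disjoint vertex sets s1 and s2.

    Vertices of s1 keep their weight as first-type weight (second type zero);
    vertices of s2 keep their weight as second-type weight (first type zero).
    """
    first = {v: (weight[v] if v in s1 else 0) for v in weight}
    second = {v: (weight[v] if v in s2 else 0) for v in weight}
    return first, second


def pair_prize(first, second, u, v):
    """Prize of the ordered pair (u, v): first-type weight of u times second-type weight of v."""
    return first[u] * second[v]
```

With these weights an ordered pair has a non-zero prize only if its first vertex lies in the first set and its second vertex lies in the second set, so pairs inside one set never incur a penalty.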
Theorem 4. For any $\epsilon, \epsilon' > 0$, there is a bicriteria $(1 + \epsilon, 1 - \epsilon')$-approximation algorithm for the
Euclidean Asymmetric $S$-MPCSF problem that runs in time polynomial in $n$ and $1/\epsilon'$ and exponential
in $1/\epsilon$. In addition, for any constant $\epsilon > 0$, there is a $(1 + \epsilon)$-approximation algorithm for the Euclidean
Asymmetric MPCSF problem that runs in polynomial time.

Indeed, the algorithms in Theorem 4 can be extended to the case in which there are a constant
number of different types of weights for each vertex, generalizing the case in which we have a
constant number of disjoint sets and we pay the multiplicative penalty when two vertices from
two different sets are not connected. Notice that the case of two disjoint sets already generalizes
the prize-collecting Steiner tree problem (by considering $S_1 = \{r\}$ and $S_2 = V - \{r\}$) whose best
approximation guarantee is currently 1.992.

At the end, we present in Section 6 why PCSF and $k$-forest problems can be APX-hard in the
general case (and thus for PTASs we cannot go beyond the multiplicative case). We conclude with
some open problems in Section 7. All the omitted proofs appear in the appendix.

1.3 Our techniques for the prize-collecting version 

Here, we summarize our techniques for the multiplicative prize-collecting Steiner forest algorithms;
see Section 5. In all those algorithms, we store in each DP state extra parameters, including the
sum of the weights, as well as the multiplicative prize already collected in each component. These
parameters enable us to carry out the DP update procedure. Interestingly, the sum and collected
prize parameters have their own precision units.

In the asymmetric version, a major issue is that no fixed unit is good for all sum parameters.
Some may be small, yet have significant effect when multiplied by others. To remedy this, we use
variable units, reminiscent of the floating-point storage formats (mantissa and exponent). To the
best of our knowledge, Bateni and Hajiaghayi [4] were the first to take advantage of this idea in the
context of (polynomial-time) approximation schemes. The basic idea is that a certain parameter
in the description of DP states has a large (not polynomial) range; however, as the value grows, we
can afford to sacrifice more on the precision. Thus, we store two (polynomial) integer numbers, say
$(i, x)$, where $i$ denotes a variable unit, and $x$ is the coefficient: the actual number is then recovered
by $x \cdot U_i$. The conversion between these representations is not lossless, but the aggregate error can
be bounded satisfactorily.
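The variable-unit idea can be illustrated with a short Python sketch. The choice of units $U_i = 2^i$ and the fixed mantissa width are assumptions made for this example only; the units actually used in Section 5 may differ. The point is that both stored numbers stay small while the relative rounding error remains bounded.

```python
def encode(value, mantissa_bits):
    """Represent a non-negative integer as (i, x) with x < 2**mantissa_bits.

    The assumed unit is U_i = 2**i, so the stored pair stands for x * 2**i,
    which never exceeds the original value.
    """
    i = max(0, value.bit_length() - mantissa_bits)
    return i, value >> i


def decode(i, x):
    """Recover the (rounded-down) number x * U_i with U_i = 2**i."""
    return x << i
```

For instance, with a 4-bit coefficient the value 1000 is stored as $(6, 15)$ and recovered as $15 \cdot 2^6 = 960$. In general the loss is less than one unit $U_i$, hence at most a $2^{1-b}$ fraction of the value for a $b$-bit coefficient.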

In Section 5.3 we consider the problem where the objective is a linear function of penalties paid
and the cost of the forest built. The challenging case is when the cost of the optimal forest is very
small compared to the penalties paid. In this case, we identify a set of vertices with large penalties
and argue they have to be connected in the optimal solution. Then, with a novel trick we show
how to ignore them in the beginning, and take them into account only after the DP is carried out.

2 Preliminaries 

Let $n = |V|$ be the total number of terminals and let OPT be the total length of the optimal solution.
A bitmap is a matrix with 0-1 entries. Two bitmaps of the same dimensions are called disjoint if and
only if they do not have value one at the same entry. Consider two partitions
$\mathcal{P} = \{P_1, P_2, \ldots, P_{|\mathcal{P}|}\}$ and $\mathcal{P}' = \{P'_1, P'_2, \ldots, P'_{|\mathcal{P}'|}\}$
over the same ground set. Then, $\mathcal{P}$ is said to be a refinement of $\mathcal{P}'$ if
and only if any set of $\mathcal{P}$ is a subset of a set in $\mathcal{P}'$, namely
$\forall P \in \mathcal{P},\ \exists P' \in \mathcal{P}' : P \subseteq P'$.
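Both definitions translate into one-line checks. The following Python sketch assumes bitmaps are lists of rows of 0/1 entries and partitions are lists of Python sets; the function names are chosen for this illustration.

```python
def bitmaps_disjoint(a, b):
    """True iff two same-sized 0-1 matrices never have a one at the same entry."""
    return all(not (x and y)
               for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))


def is_refinement(fine, coarse):
    """True iff every set of `fine` is contained in some set of `coarse`."""
    return all(any(p <= q for q in coarse) for p in fine)
```

For example, $\{\{1\}, \{2\}, \{3, 4\}\}$ is a refinement of $\{\{1, 2\}, \{3, 4\}\}$, but not the other way around.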
Figure 1: (a) An example of a dissection square with depth 3, and depiction of portals for a sample
dissection square with $m = 8$; (b) the $\gamma \times \gamma$ grid of cells inside a sample dissection square with
$\gamma = 4$.


By standard perturbation and scaling techniques, we can assume the following conditions hold,
incurring a cost increase of $O(\epsilon\,\mathrm{OPT})$; see [2, 12] for example.

(I) The diameter of the set $V$ is at most $d' = n^2 \epsilon^{-1}\,\mathrm{OPT}$.

(II) All the vertices of $V$ and the Steiner points have coordinates $(2i + 1, 2j + 1)$ where $i$ and $j$
are integers.

For simplicity of exposition, we ignore the above increase in cost. As we are going to obtain
a PTAS, this increase will be absorbed in the future cost increases. We have a grid consisting of
vertical and horizontal lines with equations $x = 2i$ and $y = 2j$ where $i$ and $j$ are integers. Let $\mathcal{L}$
denote the set of lines in the grid. We let $L$ be the smallest power of two greater than or equal to
$2d'$ and perform a dissection on the randomly shifted bounding box of size $L \times L$; see Figure 1(a).

For each dissection square $R$ and each side $S$ of $R$, designate $m + 1$ equally spaced points along
$S$ (including the corners) as portals of $R$, where $m$ is the smallest power of 2 greater than $4\epsilon^{-1} \log L$.
So the square $R$ has $4m$ portals.
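The portal placement can be sketched as follows. This is an illustration under assumptions: the logarithm is taken to base 2, squares are axis-parallel with integer side length, and the function names are hypothetical. Exact rational arithmetic is used so that the shared corner portals are not counted twice.

```python
import math
from fractions import Fraction


def portal_parameter(eps, L):
    """Smallest power of two strictly greater than (4 / eps) * log2(L)."""
    bound = (4 / eps) * math.log2(L)
    m = 1
    while m <= bound:
        m *= 2
    return m


def portals(x0, y0, side, m):
    """Portals of the square with lower-left corner (x0, y0): m + 1 per side, corners shared."""
    step = Fraction(side, m)
    pts = set()
    for k in range(m + 1):
        d = k * step
        pts.add((x0 + d, y0))          # bottom side
        pts.add((x0 + d, y0 + side))   # top side
        pts.add((x0, y0 + d))          # left side
        pts.add((x0 + side, y0 + d))   # right side
    return pts
```

Each side carries $m + 1$ portals and the four corners are shared by two sides each, which gives $4(m + 1) - 4 = 4m$ distinct portals, matching the count above.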

There is a notion of level associated with each dissection square, line, or side of a square. The
bounding box has level zero, and the level of each other dissection square is one more than the level of
its parent dissection square. The level of a line $\ell$ is the minimum level of a square $R$ a side of which
falls on the line $\ell$. Thus, the first two lines dividing the bounding box have level one. If a side $S$ of
a square $R$ falls on a line $\ell$, we define $\mathrm{level}(S) = \mathrm{level}(\ell)$. So $\mathrm{level}(S) \leq \mathrm{level}(R)$. The thickness of
the lines in Figure 1 denotes their level: the thicker the line, the lower is its level.
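As a small illustration of the level of a line, the Python sketch below computes it for a line strictly inside a bounding box whose side $L$ is a power of two. It assumes the line is given by its integer distance from the left (or bottom) edge of the already shifted box; the function name is hypothetical.

```python
def line_level(offset, L):
    """Level of the dissection line at integer distance `offset` from the box edge.

    L is the side of the bounding box (a power of two) and 0 < offset < L.
    The line through the middle has level 1, the quarter lines level 2, and so on.
    """
    if not 0 < offset < L:
        raise ValueError("the line must lie strictly inside the bounding box")
    level, spacing = 1, L // 2
    while offset % spacing != 0:
        spacing //= 2
        level += 1
    return level
```

For $L = 8$ the lines at offsets $1, \ldots, 7$ receive levels $3, 2, 3, 1, 3, 2, 3$: one line of level one, two of level two, and four of level three.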

For a (possibly infinite) set of geometric points $X$, let $\mathrm{comp}(X)$ denote the number of connected
components of $X$; we will use the shorthand "component" in this paper. With slight abuse of
notation, $\ell \in \mathcal{L}$ is used to refer to the set of points$^3$ on $\ell$. In addition, we use $\mathcal{L}$ to denote the
union of points on the lines in $\mathcal{L}$. Similarly, we use $R$ to denote the set of all points on or inside
the square $R$. The set of points on (the boundary of) the square $R$ is referred to by $\partial R$. The total
length of all line segments in $F$ is denoted by $\mathrm{length}(F)$.

The following theorem is mentioned in [12] in a stronger form. We only need its first half whose
proof follows from [2].

$^3$ Not necessarily terminals.

Theorem 5. [12] There is a solution $F$ having expected length at most (1 + |e)OPT such that each
dissection square $R$ satisfies the following two properties: for each side $S$ of $R$, $F \cap S$ has at most
$p = O(\epsilon^{-1})$ non-corner components$^4$ (boundary components property); and each component of $F \cap \partial R$
contains a portal of $R$ (portal property).



3 Structural theorem 

Let $R$ be a dissection square. Divide $R$ into a regular $\gamma \times \gamma$ grid of cells, where $\gamma$ is a constant
power of two determined later; see Figure 1(b). We say $R$ is the owner of these cells. The level
of these cells, as well as the new lines they introduce, is defined in accordance with the dissection.
That is, we assign them levels as if they are normal dissection squares and we have continued
the dissection procedure for $\log \gamma$ more levels. There are several lemmas in the work of [12] to
prove the structural property they require (this is the main contribution of that work). We modify
the dynamic programming definition such that its proof of correctness needs a simpler structural
property. The proof of this property is simpler than that in the aforementioned paper.

Theorem 6. There is a solution $F$ having expected length at most (1 + ^e)OPT such that each
dissection square $R$ satisfies the locality property: if the terminals $t_1$ and $t_2$ are inside a cell $C$ of $R$ and
are connected to $\partial R$ via $F$, then they are connected in $F \cap R$.

The proof has ideas similar to [12, Theorem 3.2, and Lemmas 3.3, 3.4, 3.5 and 3.9]. We first
mention and prove a lemma we need in order to prove Theorem 6. The lemma more or less appears
in [2, 12].

Lemma 7. For the forest $F$ output by Theorem 5, $\mathrm{comp}(F \cap \mathcal{L}) \leq \mathrm{length}(F)$.

We can now prove the main structural result. A side $S$ of a square $R$ is called private if it does
not lie on a side of the parent square $R'$ of $R$. Observe that out of any two opposite sides of a
dissection square, exactly one is private.

Proof of Theorem 6. We start with a solution $F$ satisfying Theorem 5. The final solution is
produced by iteratively finding the smallest cell $C$ owned by a square $R$ that violates the locality
property, and adding $\sigma(C, F)$ to $F$, where $\sigma(C, F)$ is defined as the union of the private sides of $C$
and any side of $C$ having non-empty intersection with $F$. We claim the locality property is realized
after finitely many such additions. If after adding $\sigma(C, F)$ to $F$, the cell $C$ still violates the locality
property, there have to be exactly two opposite sides of the cell having non-empty intersection with
$F$; otherwise, $\sigma(C, F)$ is clearly connected. However, in case of the opposite sides, one middle
side will be a private side of $C$ and hence included as well.

Next, we argue that the conditions of Theorem 5 still hold. Take a side $S$ of any square $R$. If
the conditions are to be affected for $S$, it has to be due to an addition involving a cell $C$ that has
a side $S'$ such that (1) $S'$ has non-empty intersection with $S$, and (2) $S'$ is added to $F$ as part of
$\sigma(C, F)$. The condition will be trivial if $S'$ contains $S$. Thus, we assume that $C$ is a smaller square
than $R$. So $S'$ cannot be a private side of $C$. However, the number of components on $S$ cannot
increase if $S'$ already has an intersection with $F$.



$^4$ Non-corner components are those not including any corners of squares. Note that each square can have at most
four corner components.
Finally we show that the additional length is not large. Let $F^* = F \cap \mathcal{L}$, and let
$Q = \{(x, y) : x = 2i, y = 2j\}$ be the set of all grid points. We will charge the additions to the connected
components of $F^* - Q$. Notice that

$$\begin{aligned}
\mathrm{comp}(F^* - Q) &\leq \mathrm{comp}(F^*) + 3\,|F^* \cap Q| && (1)\\
&\leq \mathrm{comp}(F^*) + 3 \cdot (\mathrm{length}(F^*) + \mathrm{comp}(F^*)) && (2)\\
&= 4\,\mathrm{comp}(F^*) + 3\,\mathrm{length}(F^*)\\
&\leq 7\,\mathrm{length}(F), \qquad \text{by Lemma 7.} && (3)
\end{aligned}$$

Inequality (1) holds because removal of each grid point on $F^*$ increases the number of components
by at most three. To obtain (2), notice that in any connected component of $F^*$, the distance
between any two points of $F^* \cap Q$ is at least 2. Hence, if there is more than one such point, there
cannot be more than $\mathrm{length}(F^*)$ of them.

We charge this addition to a connected component of $(\partial R \cap F) - Q$, in such a way that each
connected component is charged at most twice: once from each side. For simplicity, we duplicate
each connected component of $(F \cap \ell) - Q$: they correspond to squares from either side of $\ell$. For any
dissection square $R$, let $\mathcal{C}_R$ refer to the connected components of $F \cap R$ that reach $\partial R$. Further, let
$\mathcal{K}_R$ be the set of connected components of $(F \cap \partial R) - Q$. When $\sigma(C, F)$ is added where $R$ is the
owner of $C$, there are $k \geq 2$ components $c_1, \ldots, c_k \in \mathcal{C}_R$ that become connected. Any element of
$\mathcal{K}_R$ connected via $F \cap R$ to a component $c \in \mathcal{C}_R$ is said to be an interface of $c$. The addition will
be charged to a free interface of some $c \in \mathcal{C}_R$ with maximum level. This element will no longer be
free for the rest of the procedure. We argue this procedure successfully charges all the additions
to appropriate border components. To this end, we shortly prove the following stronger claim via
induction on the number of additions performed. We call a dissection square $R$ violated if the
locality property does not hold for a cell $C$ owned by $R$.

Claim 8. At all times during the execution of this procedure, any component $c \in \mathcal{C}_R$ has a free
interface, for each violated square $R$. As a result, any addition can be charged to a free component.

The second statement of the claim follows from the first part. The first part is proved as follows.
The claim clearly holds at the beginning, since all interfaces are free, and each component has an
interface. Suppose the addition $\sigma(C, F)$ is performed and let $R$ be the owner of $C$. We show any
dissection square $R'$ will stay fine. Notice that the size of the squares $R$ for which the addition is
performed is increasing in time. Hence, any dissection square $R'$ smaller than $R$ is irrelevant in the
statement of the claim, since they cannot be violated. For $R$ itself, each $c_j$ has at least one free
interface. One of the interfaces is used, and thus the new component formed by their union has a
free interface. Suppose for the sake of reaching a contradiction that a component $c' \in \mathcal{C}_{R'}$ has no
free interface after the addition. Thus $R'$ contains $R$, and the charging was not done to a private
side of $R$. Recall that prior to the addition, $c'$ is connected to some components of $\mathcal{C}_R$ with at least
two free interfaces in $R$. One of them still remains free. We charged to the interface of maximum
level and it was in $\partial R'$. Hence, the free interface is also in $\partial R$, leading to a contradiction.

Let (the random variable) $c_{\ell,j}$ denote the number of charges to components on $\ell \in \mathcal{L}$ due to cells
$C$ owned by squares $R$ of level $j$. Independently of the randomness, $\sum_{\ell} \sum_{j} c_{\ell,j} \leq 2\,\mathrm{comp}(F^* - Q)$
by the above discussion and Claim 8. Note the cost of adding $\sigma(C, F)$ (charged to a component on
$R$) is at most $4L'/\gamma$ where $L'$ is the side length of $R$. The total increase due to charges to $\ell$ is at
most $\sum_{j \geq \mathrm{depth}(\ell)} c_{\ell,j} \frac{4L}{2^j \gamma}$, where $L$ is the side length of the bounding box. Due to the randomization
in the dissection, we have $\Pr[\mathrm{depth}(\ell) = i] = 2^i/L$; see [2] for instance. The expected increase in
length is thus

$$\begin{aligned}
\sum_{\ell} \sum_{i} \frac{2^i}{L} \sum_{j \geq i} c_{\ell,j} \frac{4L}{2^j \gamma}
&= \frac{4}{\gamma} \sum_{\ell} \sum_{j} c_{\ell,j} \sum_{i \leq j} 2^{i-j}\\
&\leq \frac{8}{\gamma} \sum_{\ell} \sum_{j} c_{\ell,j}\\
&\leq \frac{16}{\gamma}\,\mathrm{comp}(F^* - Q) && \text{by Claim 8}\\
&\leq \frac{112}{\gamma}\,\mathrm{length}(F) && \text{by (3).}
\end{aligned}$$

We pick $\gamma$ to be the smallest power of two larger than $112(1 + \epsilon) \cdot 2\epsilon^{-1}$ to finish the proof. □

Therefore, with probability 1/2, we have $\mathrm{length}(F) \leq (1 + \epsilon)\mathrm{OPT}$. In the entire argument, no
attempt was made to optimize the parameters.



4 The algorithm 

A subsolution for $R$ is a finite set of line segments $F \subseteq R$ satisfying conditions of Theorems 5 and
6, with the extra property that any terminal $t$ in $R$ is connected via $F$ either to its mate or to $\partial R$.
A configuration $\chi = (\mathcal{K}, \mathcal{P})$ for $R$ has two portions: a set $\mathcal{K}$ of pairs $\kappa_i = (P_i, M_i)$ and a partition
$\mathcal{P}$ whose ground set is $\mathcal{K}$, such that

• $P_i$ is a subset of portals of $R$;

• $M_i$ is a bitmap of size $\gamma \times \gamma$;

• $P_i$ and $P_j$ are disjoint if $i \neq j$;

• the total number of portals, namely $\sum_i |P_i|$, is at most $4(p + 1)$; and

• bitmaps $M_i$ and $M_j$ are disjoint if $i \neq j$.

The configuration captures sufficient information about $F$ so as to make it possible to take care of
the interaction between $R$ and the outside. In particular, each pair $(P, M)$ describes a connected
component of $F$, by specifying the set of portals on its boundary and the set of cells connected to
these portals. Roughly speaking, the partition $\mathcal{P}$ tells us which components $\kappa_i$ and $\kappa_j$ need to be
connected from outside $R$: this implies the existence of a pair of terminals that are in $\kappa_i$ and $\kappa_j$,
respectively, but they are not connected in $R$. We will see below why this restrictive abstraction
does not lose any crucial subsolutions.
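The conditions on a configuration can be checked mechanically. The sketch below is one possible encoding, assumed for illustration only: a configuration is a list of `(portal_set, bitmap)` pairs together with a partition given as a list of sets of pair indices, and `p` is the bound on boundary components per side.

```python
def is_valid_configuration(pairs, partition, p):
    """Check the listed conditions for pairs [(P_i, M_i), ...] and a partition of their indices."""
    all_portals = [q for portal_set, _ in pairs for q in portal_set]
    # Portal sets must be pairwise disjoint.
    if len(all_portals) != len(set(all_portals)):
        return False
    # The total number of portals is at most 4(p + 1).
    if len(all_portals) > 4 * (p + 1):
        return False
    # Bitmaps must be pairwise disjoint: no cell is set in two bitmaps.
    size = len(pairs[0][1]) if pairs else 0
    for r in range(size):
        for c in range(size):
            if sum(bitmap[r][c] for _, bitmap in pairs) > 1:
                return False
    # The partition must cover every pair index exactly once.
    covered = sorted(i for block in partition for i in block)
    return covered == list(range(len(pairs)))
```

Membership of the portals in the actual portal set of $R$ is not checked here, since that depends on how portals are named in a concrete implementation.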

We say a subsolution $F$ is compatible with a configuration $\chi = (\mathcal{K}, \mathcal{P})$ if

1. for any connected component $\kappa$ of $F$ that intersects $\partial R$, there exists a pair $\kappa' = (P, M) \in \mathcal{K}$,
such that

• $\kappa$ spans $P$;

• each connected component of $\kappa \cap \partial R$ contains a portal of $P$;

• the bitmap $M$ has value one in the positions corresponding to any cell $C$ containing a
terminal $t$ of $\kappa$; and

2. any terminal pair located in different components $\kappa_1$ and $\kappa_2$ of $\mathcal{K}$ is either connected via
$F \cap R$, or $\kappa_1$ and $\kappa_2$ are in the same set of $\mathcal{P}$.

4.1 The dynamic programming 

In the dynamic program, we build a table $T_R[\chi]$, indexed by configurations $\chi$ for each dissection
square $R$. The goal is to populate this table so that $T_R[\chi]$ is the minimum length of a subsolution
for $R$ that is compatible with $\chi$. First of all, we show that for each $R$, the number of configurations
is small. Consider $\chi = (\mathcal{K}, \mathcal{P})$. There are at most $\lambda = 4(p + 1)$ pairs in $\mathcal{K}$. For a particular
$\kappa = (P, M)$, there are $O(m^{\lambda+1})$ options for the set of portals $P$. The bitmap $M$
has $2^{\gamma^2}$ possibilities. A crude upper bound of $2^{\lambda^2}$ is trivial for possibilities of $\mathcal{P}$. Thus, the total
number will be at most

$$\left(O(m^{\lambda+1}) \cdot 2^{\gamma^2}\right)^{\lambda} \cdot 2^{\lambda^2} = O(\mathrm{poly}(m)) = O(\mathrm{poly} \log(n)).$$

Theorems 5 and 6 guarantee the existence of a near-optimal solution all of whose subproblems are
compatible with a configuration: the connected components of $F$ reaching $\partial R$ can be decomposed
into disjoint bitmaps because of Theorem 6. Theorem 5, on the other hand, ensures each connected
component on $\partial R$ contains a portal, and the total number of such components is small. The details
of the DP update, as well as its correctness proof, appear below.

The final solution of the problem is obtained from the minimum $T_R[\chi]$ where $R$ is the bounding
box, and $\mathcal{P}$ of $\chi$ does not require any connections: i.e., all sets of the partition are singletons. This
would imply all the necessary connections have been made inside $R$. To actually construct the
solution, we need to store additional information in each dynamic programming state indicating
which configurations it was last updated from. It is then straightforward to recursively construct
the solution, by taking the union of the pertinent configurations.

Here we show how the dynamic programming table for Euclidean prize-collecting Steiner forest
is updated from the already-computed values. And finally we show why the update routine is
sound and complete. The table $T_R[\chi]$ is populated in the order of increasing size for $R$. For a
base dissection square $R$, finding the value of $T_R[\chi]$ is straightforward. Notice that there is at most
one point (possibly with several terminals collocated) inside $R$. Depending on whether the mates
of those terminals are collocated with them or not, we may need to connect some of them to the
boundary $\partial R$. There are only a constant number of portals in $\chi$, hence we can go over all the ways
to connect them up and find the smallest value. Note that there cannot be any Steiner point inside
$R$.

Now we get to the update rule. Consider a dissection square $R$ and a corresponding configuration
$\chi = (\mathcal{K}, \mathcal{P})$. Let $R_i$ for $i = 1, 2, 3, 4$ be the children of $R$ in the dissection. Take corresponding
configurations $\chi_i = (\mathcal{K}_i, \mathcal{P}_i)$. Notice that each cell of $R$ consists of exactly four cells of one $R_i$. We
can expand a bitmap $M$ of $R_i$ to a bitmap $M'$ of dimensions $2\gamma \times 2\gamma$ for $R$, by placing three all-zero
bitmaps of dimensions $\gamma \times \gamma$ at appropriate locations around $M$. We do this in such a way that the
portion corresponding to $M$ still points to $R_i$ inside $R$. Consider all the components $\kappa = (P, M)$
corresponding to the four subsquares, expand their bitmaps, and collect them in $\mathcal{K}^1$. Merge the
partitions $\mathcal{P}_i$ to get $\mathcal{P}'$. If there is a terminal pair $(s, t)$ where $s$ is in $R_i$ and $t$ is in a different $R_j$,
Algorithm EuclideanSteinerForest

Input: Set of terminals $V$ in the plane, and set $\mathcal{D}$ of pairs of terminals
Output: A forest $F$ connecting pairs in $\mathcal{D}$

1. Carry out the perturbation and scaling.

2. Let $L$ be the smallest power of two larger than $2n^2\epsilon^{-1}d$, where $d$ is the maximum distance
   of a pair.

3. Perform a random dissection in the bounding box of side $L$.

4. Place $m + 1$ portals on each side of a dissection square, where $m$ is the smallest power
   of two larger than $4\epsilon^{-1} \log L$.

5. Solve the base cases $T_R[\chi]$ for leaf dissection squares $R$:
   Go over all possible ways of connecting the portals and the center point.

6. Populate the table $T_R[\chi]$ in increasing order of size for $R$:
   For any $\chi = (\mathcal{K}, \mathcal{P})$ corresponding to $R$ consisting of $R_1, \ldots, R_4$:

   (a) Go over all configurations $\chi_i = (\mathcal{K}_i, \mathcal{P}_i)$ corresponding to $R_i$.

   (b) Build $\mathcal{K}^1$ from the union of all components of $\mathcal{K}_i$ with expanded bitmaps.

   (c) Build $\mathcal{P}'$ from the union of $\mathcal{P}_i$.

   (d) If there is a terminal pair $(t_1, t_2)$ where $t_1 \in R_{i_1}$ and $t_2 \in R_{i_2}$ for $i_1 \neq i_2$:

       • If there is no bitmap in $\chi_{i_1}$ (or $\chi_{i_2}$) containing the cell containing $t_1$ (or $t_2$
         respectively), the configuration is bad.

       • Otherwise, merge the sets corresponding to the appropriate components in $\mathcal{P}'$.

   (e) Build $\mathcal{K}^2$ by merging components having the same portals, and make appropriate
       changes to $\mathcal{P}'$.

   (f) Build $\mathcal{K}^3$ by removing portals not on $\partial R$.

   (g) If any component with empty portal set has unsatisfied connectivity requirement
       in $\mathcal{P}'$, the current configurations are not consistent.

   (h) Build $\mathcal{K}^4$ by eliminating components with empty portal set.

   (i) If any bitmap contradicts the locality property, these configurations are not
       consistent.

   (j) If the configurations are consistent, update $T_R[\chi]$ with
       $\min\{T_R[\chi], \sum_i T_{R_i}[\chi_i]\}$.

7. Find the final solution among $T_R[\chi]$ where $R$ is the bounding box and $\chi$ has no
   unsatisfied requirement.

8. Construct the solution $F$ by recursively following the values from $T_R[\chi]$.

Figure 2: The algorithm for Euclidean Steiner forest problem.
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there should be a component corresponding to each of these in Ri and Rj, respectively. Otherwise, 
these configurations do not correspond to any (valid) subsolution. Merge the sets corresponding to 
these components in V: i.e., they have to be connected. Next merge any two components of /C 1 if 
they share a portal, and build K? . Further, make appropriate changes in V' . Build K? by removing 
from K? all portals not on dR. Some of these components reach dR and some do not, namely 
those with an empty portal set P. If there is any component with empty portal set that is not one 
partition set, we deem the configurations %i as inconsistent: in this case, some components that 
are required to be connected together do not reach the boundary. Otherwise, remove all the pairs 
in K? with empty portal set to obtain /C 4 . Now, if there is a cell of R whose four constituent cells 
reach the boundary as more than one connected component, the configurations are not consistent 
either: this contradicts the property of Theorem [H Finally, reduce the dimensions of the bitmaps 
to 7 x 7 such that a cell of the new bitmap acquires value one if and only if there is a one in one 
of the positions corresponding to the constituent cells in the original bitmap. Now, \ = (^■>^ J ) 
is said to be consistent with the four configurations Xi, • • • ,%4 if and only if V contains all the 
requirements of V' , i.e., V' is a refinement of V, and in addition, there exists a k = (P, M) £ K, for 
any k' = {P' , M') £ K! such that PCP' and M = M' . In case these configurations are consistent, 
Tr[x] will take the minimum of its current value and Yli^RiiXi]- You can refer to Figure [2] for a 
summary. 
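The portal-based merging and the restriction to $\partial R$ can be sketched as follows. This is only an illustrative sketch, not the paper's implementation: components are modeled as plain sets of portal identifiers, bitmaps and the partition $\mathcal{P}'$ are ignored, and the function names `merge_by_shared_portals` and `restrict_to_boundary` are hypothetical.

```python
def merge_by_shared_portals(components):
    """Union components (given as sets of portal ids) that share a portal."""
    parent = list(range(len(components)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    owner = {}  # portal id -> index of the first component seen with it
    for idx, portals in enumerate(components):
        for p in portals:
            if p in owner:
                parent[find(idx)] = find(owner[p])
            else:
                owner[p] = idx

    merged = {}
    for idx, portals in enumerate(components):
        merged.setdefault(find(idx), set()).update(portals)
    return list(merged.values())


def restrict_to_boundary(components, boundary):
    """Drop portals not on the boundary; count components left with no portal."""
    kept, interior = [], 0
    for portals in components:
        on_boundary = portals & boundary
        if on_boundary:
            kept.append(on_boundary)
        else:
            interior += 1
    return kept, interior
```

In the terms of the text, the first function corresponds to building $\mathcal{K}^2$, and the second to building $\mathcal{K}^3$ and then discarding the empty-portal components to reach $\mathcal{K}^4$; the count of interior components is where the connectivity check on $\mathcal{P}'$ would be applied.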

4.2 Proof of correctness 

Correctness follows by induction on the size of the square $R$: all dynamic programming states have their intended value. In particular, we know that there is a near-optimal solution all of whose subsolutions are compatible with some configuration. Hence, these will be computed correctly and give the final solution. More specifically, the following claim holds for all DP states.

Lemma 9. A dynamic programming state $T_R[\chi]$ ends up having the minimum value corresponding to a solution $F$ of $R$, such that for any dissection square $R'$ which is a descendant of $R$ in the dissection tree, the subsolution $F \cap R'$ of $R'$ is compatible with a configuration $\chi'$ for $R'$.

We are now in a position to prove the main theorem regarding the Euclidean Steiner forest problem.

Proof of Theorem 1. By Lemma 9, the proposed dynamic programming is sound and complete. There are $\Phi = O(\mathrm{poly}(n))$ DP states. To solve each non-base state, we go over at most $\Phi^4$ combinations of child states and then perform a polynomial-time consistency check. Each base case state is computed in constant time. Hence, the total algorithm runs in time $O(\mathrm{poly}(n))$. □

4.3 Highlights of the new ideas 

Here, we point out the differences between our work and the previous work of [12]. Borradaile et al. use closed paths to identify the connected zones of the dissection square. These paths consist of vertical and horizontal lines, and all the break-points are corners of the cells. As part of their structural property, they prove that they can guarantee a solution in which these zones can be identified via paths whose total length is at most a constant $\eta$ times the perimeter of the square $R$. Then each path is represented by a chain over $\{1, 2, 3\}$ of length at most $O(\eta\gamma)$: the three values denote moving one unit forward, or turning to the left or right. This results in a storage of $3^{O(\eta\gamma)}$, which is a constant parameter. Instead, we use a bitmap of size $\gamma \times \gamma$ to address this issue. Each zone is represented by a bitmap that has an entry one in the cells of the zone. The bound that we obtain, $2^{\gamma^2}$, may be slightly worse than the previous work; however, a simpler structural property, namely the locality property, suffices for the proof of correctness. Borradaile et al., in contrast, need a bound on the total length of the zone boundaries, as noted above.
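As a small illustration of the bitmap representation, the following sketch shows the coarsening step used when four child squares are combined: a cell of the reduced bitmap is one exactly when one of its constituent cells is one. It assumes each new cell covers a 2 × 2 block of old cells and that the bitmap is a square list of lists with even side length; the name `reduce_bitmap` is hypothetical.

```python
def reduce_bitmap(bitmap):
    """Halve each dimension; a new cell is 1 iff any of its 2x2 constituent cells is 1."""
    half = len(bitmap) // 2
    return [
        [
            int(any(bitmap[2 * r + dr][2 * c + dc] for dr in (0, 1) for dc in (0, 1)))
            for c in range(half)
        ]
        for r in range(half)
    ]
```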

In addition to the simplification made due to this change, both to the proof and the treatment 
of the dynamic programming, we simplify the proof further. Borradaile et al. charge the additions 
of cr(C, F) to three different structures, and the argument is described and analyzed separately for 
each. We manage to perform a universal treatment, charging all the additions to the simplest of the three structures in their work. But this can be done only after showing that $F^* - Q$ has a limited number of components. The proof is simple yet elegant; a weaker claim is proved in [12], but even the statement of that claim is hard to read.

5 Multiplicative prizes 

We first tackle the $S$-multiplicative prize-collecting Steiner forest (S-MPCSF) problem. Then, we will take a look at its asymmetric generalization. Finally, we show how the multiplicative prize-collecting Steiner forest problem can be reduced to S-MPCSF.

5.1 Collecting a fixed prize 

Suppose we are given $S$, the amount of prize we should collect. Let OPT be the minimum cost of a forest $F$ that collects a prize of at least $S$, and suppose $Q \subseteq \mathcal{D}$ is the set of terminal pairs connected via $F$. We show how to find a forest with cost at most $(1+\epsilon)\mathrm{OPT}$ that collects a prize of at least $(1-\epsilon')S$. By the structural property, we know that there is a solution $F'$ connecting the same set of terminal pairs $Q$ whose cost is at most $(1+\epsilon)\mathrm{OPT}$ and which satisfies the conditions of Theorems 5 and 6. Round all the vertex weights down to the next integer multiple of $\theta = \epsilon'\sqrt{S}/(2n)$. In a connected component of $F'$ of total weight $A_i$ that lost a weight $a_i$ due to rounding, the lost prize is $A_i^2 - (A_i - a_i)^2 \le 2a_iA_i \le 2a_i\sqrt{S}$, because the total weight of the component is at most $\sqrt{S}$. Since $\sum_i a_i \le n\theta$, $F'$ collects at least $S - 2n\theta\sqrt{S} = (1-\epsilon')S$ from the rounded weights.

Each dynamic programming state consists of a dissection square $R$, a set of components $\mathcal{K}$, and a new parameter $\Pi$ which denotes the total prize collected inside $R$ by connecting the terminal pairs. Each element of $\mathcal{K}$, corresponding to a connected component in the subsolution, now has the form $\kappa = (P, \Sigma)$ where $P$ denotes the portals of $\kappa$, and $\Sigma$ is the total sum of the weights in $\kappa$. The DP is carried out in a fashion similar to that of [2]. The values of $\Sigma$ and $\Pi$ are easy to determine for the base cases. It is not difficult to update them, either. Whenever two components $\kappa_1 = (P_1, \Sigma_1)$ and $\kappa_2 = (P_2, \Sigma_2)$ merge in the DP, the sum $\Sigma$ for the new component is simply $\Sigma_1 + \Sigma_2$. Besides, the merge increases the $\Pi$ value of the DP state by $2\Sigma_1\Sigma_2$.
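The merge rule can be illustrated with a few lines of code. The sketch below is an assumption-laden toy: $\Sigma$ and $\Pi$ are kept as integer counts of their storage units, the caps play the role of the truncation at $\sqrt{S}$ and $S$ used in the proof that follows, and the function name `merge_update` is hypothetical.

```python
def merge_update(sigma1, sigma2, pi, sigma_cap, pi_cap):
    """Merge two components with weight sums sigma1 and sigma2.

    The merged weight is sigma1 + sigma2, and the collected prize grows by
    2 * sigma1 * sigma2, since (a + b)^2 = a^2 + b^2 + 2ab. Both values are
    truncated at their caps.
    """
    new_sigma = min(sigma1 + sigma2, sigma_cap)
    new_pi = min(pi + 2 * sigma1 * sigma2, pi_cap)
    return new_sigma, new_pi
```

For example, two components of weight 3 and 4 have already collected $9 + 16 = 25$; after merging, the single component of weight 7 collects 49.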
Proof of Theorem 2. Soundness and completeness are simple and follow along the same lines as the proof of Theorem 1. Carrying out the above operations assumes the values of $\Sigma$ and $\Pi$ can be stored accurately. However, as they describe the dynamic programming states, their size should be sufficiently small, or else the algorithm will not run in polynomial time. This is where the rounding helps. All values of $\Sigma$ are stored as multiples of $\theta$, and the values of $\Pi$ are stored as multiples of $\theta^2$. Notice that as we round the vertex weights at the beginning, throughout the algorithm the values of $\Sigma$ and $\Pi$ will be multiples of their respective units. Hence, no extra precision error will occur and we find the aforementioned solution. If at any time during the execution of the algorithm the value of $\Sigma$ goes above $\sqrt{S}$, we truncate it to $\sqrt{S}$. Similarly, the value of $\Pi$ is not allowed to surpass $S$. This does not eliminate any solution, because at the point of truncation, the subsolution has already gathered sufficient prize. Hence, the range of $\Sigma$ is from zero up to $\sqrt{S}$, and this gives $\sqrt{S}/\theta = 2n/\epsilon'$ different values. Similarly for $\Pi$, there are at most $S/\theta^2 = 4n^2/\epsilon'^2$ options. There are at most



$$\Phi_1 = O\left(m^{\lambda+1} \cdot 2^{\gamma^2} \cdot 2n/\epsilon' \cdot 2^{\lambda} \cdot 4n^2/\epsilon'^2\right) = O(\mathrm{poly}(n))$$

DP states for each square $R$. The running time is polynomial in $\Phi_1$ and the claim follows. □



To start the algorithm, we need to guarantee the instance satisfies the conditions at the beginning of Section 2. See Appendix A.1 for details of how this is achieved.



5.2 The asymmetric prizes 

The basic idea is to store two parameters $\Sigma^s$ and $\Sigma^t$ for each component of $\mathcal{K}$. These parameters store the total weight of the first and second type in the component, namely $\sum_v \phi^s(v)$ and $\sum_v \phi^t(v)$, respectively. The difficulty is that to collect a prize of $A = A^sA^t$ in a component, only one of the parameters $A^s$ or $A^t$ needs to be large. In particular, we cannot do a rounding with a precision like $\epsilon'\sqrt{A}/n$. It may even happen that $A^s$ is large in one component, whereas we have a large $A^t$ in another. In fact, we cannot store the values of $\Sigma^s$ or $\Sigma^t$ as multiples of a fixed unit. To get around the problem, $\Sigma^s$ is stored as a pair $(v, x)$, where $v$ is a vertex of the graph and $x$ is an integer. Together they indicate that $\Sigma^s$ is $x \cdot \epsilon_1\phi^s(v)/n^2$; the value of $\epsilon_1$ will be chosen later, and $v$ is supposed to be the vertex of largest type-one weight present in the component. A similar provision is made for $\Sigma^t$. Finally, the value of $\Pi$ is stored as a multiple of $\epsilon_2A/n$; we will shortly pick the value of $\epsilon_2$.

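Whenever $\Sigma^s_1 = (v_1, x_1)$ and $\Sigma^s_2 = (v_2, x_2)$ are added to give $\Sigma^s = (v, x)$, we do the calculation as follows: let $v$ be whichever of $v_1$ and $v_2$ has the larger $\phi^s$ value, and then

$$x = \left\lfloor \frac{x_1\epsilon_1\phi^s(v_1)/n^2 + x_2\epsilon_1\phi^s(v_2)/n^2}{\epsilon_1\phi^s(v)/n^2} \right\rfloor.$$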
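The addition rule for the $(v, x)$ encoding can be sketched as below. This is a hedged illustration, not the authors' code: `phi` is an assumed dictionary of positive type-one weights, rounding is taken to be downward, and the names `add_scaled` and `decode` are hypothetical. The common factor $\epsilon_1/n^2$ cancels in the ratio, so it only appears when decoding.

```python
from fractions import Fraction
from math import floor


def add_scaled(a, b, phi):
    """Add two weights stored as (vertex, integer multiplier) pairs."""
    (v1, x1), (v2, x2) = a, b
    # Re-express the sum in units of the heavier reference vertex.
    v = v1 if phi[v1] >= phi[v2] else v2
    x = floor(Fraction(x1 * phi[v1] + x2 * phi[v2]) / Fraction(phi[v]))
    return v, x


def decode(pair, phi, eps1, n):
    """Value represented by (v, x): x * eps1 * phi[v] / n^2."""
    v, x = pair
    return x * Fraction(eps1) * phi[v] / n ** 2
```

Each call loses less than one unit of the heavier vertex, which is the per-step rounding error that the proof below accumulates over at most $n$ additions.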

Proof of Theorem 4. The precision error for $\Sigma^s = (v, x)$ is at most $n \cdot \epsilon_1\phi^s(v)/n^2 = \epsilon_1\phi^s(v)/n$, because there is an accumulation of at most $n$ rounding errors, each of which is less than $\epsilon_1\phi^s(v)/n^2$. Notice that if $\Sigma^s$ is stored in terms of the vertex $v$, the component has to include $v$ and thus its type-one weight is at least $\phi^s(v)$. Hence, the precision error is at most an $\epsilon_1/n$ multiplicative factor. Therefore, when we multiply $\Sigma^s\Sigma^t$ to get an addition to $\Pi$, the error is at most a multiplicative $2\epsilon_1/n$: $(1-\epsilon_1/n)A^s(1-\epsilon_1/n)A^t \ge (1-2\epsilon_1/n)A^sA^t$. Next, a rounding error may occur when storing the value in terms of $\epsilon_2A/n$. Each $\Pi$, on the other hand, is made up of at most $n$ addition terms, so the total error is at most $n(2\epsilon_1/n + \epsilon_2/n)A$. We pick $\epsilon_1 = \epsilon_2 = \epsilon'/3$ to conclude that the total error is bounded by $\epsilon'A$.

All the discussion applies to $\Sigma^t$ as well. Due to truncation and rounding, there are at most $n/\epsilon_2$ options for $\Pi$, and each $\Sigma^s$ (or $\Sigma^t$) has at most $n^2/\epsilon_1$ possibilities. Thus, the total number of DP states for each dissection square is $\Phi_2 = \mathrm{poly}(n, 1/\epsilon')$. Therefore, we obtain a bicriteria approximation to the asymmetric variant of the problem. □

5.3 The prize-collecting version: trade-off between penalty and forest cost

In the prize-collecting variant, we pay for the cost of the forest and for the prizes not collected. If the total weight is $A$, the prize not collected is $A^2$ minus the collected prize. One difficulty here is to determine the correct range for the collected prize so that we can use the algorithm of Section 5.1. The trivial range is zero to $A^2$. However, the rounding precision we pick for the penalties should also take into account the cost of the forest. If the cost of the intended solution is much smaller than $A^2$, we cannot simply go with rounding errors like $\epsilon A/n$. Otherwise, the error caused by rounding the penalties will be too large compared to the solution value.

The trick is to find an estimate of the solution value, and then consider two cases depending on how the cost compares to the total penalty. Using a 3-approximation algorithm, we obtain a solution of value $\omega$. We are guaranteed that $\mathrm{OPT} \ge \omega/3$. If $A^2 \le \omega/3$, the optimum solution is to collect no prize at all. Otherwise, assume $A^2 > \omega/3$. To beat the solution of value $\omega$, we should collect a prize of at least $A^2 - \omega$.

We first consider the simpler case when $\omega/A^2 \ge 1/n^2$. For an $\epsilon' > 0$ whose precise value will be fixed below, we use the algorithm of Section 5.1 to find a bicriteria $(1+\epsilon/2, 1-\epsilon')$-approximate solution for collecting a prize $S$; this is done for every $S$ which is a multiple of $\epsilon'A^2$ in the range $[(1-\epsilon')A^2 - \omega, A^2]$. We select the best one after adding the uncollected prize to each of these solutions. Suppose the optimal solution OPT collects a prize $S'$. Let $\mathrm{OPT}_f = \mathrm{OPT} - (A^2 - S')$ be the length of the forest. Round $S'$ down to the next multiple of $\epsilon'A^2$, say $S$. Fed with prize value $S$, the algorithm finds a solution that collects a prize of at least $(1-\epsilon')S$ with forest cost at most $(1+\epsilon/2)\mathrm{OPT}_f$.
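The enumeration over prize targets can be sketched as follows. This is a minimal sketch under stated assumptions: `solve_fixed_prize` is a hypothetical callable standing in for the algorithm of Section 5.1, returning a pair (forest cost, collected prize) for a given target, `total_prize` plays the role of $A^2$, and exact rationals are used so that the multiples of $\epsilon'A^2$ are enumerated without floating-point drift.

```python
from fractions import Fraction
from math import ceil, floor


def best_over_targets(total_prize, omega, eps_prime, solve_fixed_prize):
    """Try every target that is a multiple of eps' * A^2 in [(1 - eps') A^2 - omega, A^2]."""
    step = Fraction(eps_prime) * total_prize
    low = max(Fraction(0), (1 - Fraction(eps_prime)) * total_prize - omega)
    best = None
    for k in range(ceil(low / step), floor(Fraction(total_prize) / step) + 1):
        target = k * step
        forest_cost, collected = solve_fixed_prize(target)
        # Prize-collecting objective: forest cost plus the uncollected prize.
        total = forest_cost + (total_prize - collected)
        if best is None or total < best[0]:
            best = (total, target)
    return best
```

The loop runs at most $1/\epsilon' + 1$ times, so the overhead over a single call to the fixed-prize algorithm is a factor polynomial in $1/\epsilon'$.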

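Claim 10. The total cost of this solution is at most $(1+\epsilon)\mathrm{OPT}$ if $\epsilon' = \min\left(\frac{\epsilon}{18n^2}, 1\right)$.

Proof. The total cost of this solution is

$$\begin{aligned}
\left(1+\frac{\epsilon}{2}\right)\mathrm{OPT}_f + \left[A^2 - (1-\epsilon')S\right] &\le \left(1+\frac{\epsilon}{2}\right)\mathrm{OPT}_f + \left[A^2 - (1-\epsilon')(1-\epsilon')S'\right] \\
&\le \mathrm{OPT} + \frac{\epsilon}{2}\mathrm{OPT}_f + (2\epsilon' + \epsilon'^2)S' \\
&\le \mathrm{OPT} + \frac{\epsilon}{2}\mathrm{OPT} + (2\epsilon' + \epsilon'^2)\frac{S'}{\mathrm{OPT}}\mathrm{OPT} \\
&\le \mathrm{OPT} + \frac{\epsilon}{2}\mathrm{OPT} + (2\epsilon' + \epsilon'^2)\frac{A^2}{\mathrm{OPT}}\mathrm{OPT} \\
&\le \mathrm{OPT} + \frac{\epsilon}{2}\mathrm{OPT} + (2\epsilon' + \epsilon'^2)\,3n^2\,\mathrm{OPT} && (4) \\
&\le \mathrm{OPT} + \frac{\epsilon}{2}\mathrm{OPT} + \frac{\epsilon}{2}\mathrm{OPT} && (5) \\
&= (1+\epsilon)\mathrm{OPT},
\end{aligned}$$

where (4) follows from $\frac{A^2}{\mathrm{OPT}} \le \frac{3A^2}{\omega} \le 3n^2$, and (5) uses the definition of $\epsilon'$. □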

The other case, i.e., $\omega/A^2 < 1/n^2$, is more challenging. Notice that in order to carry out the same procedure in this case, $\epsilon'$ may not be bounded below by $1/\mathrm{poly}(n)$ and thus the running time may not be polynomial. The solution, however, has to collect almost all the prize. Thus, one of the connected components includes almost all the vertex weights. We set aside a subset $B$ of vertices of large weight. The vertices of $B$ have to be connected in the solution, or else the paid penalty will be too large. Then, dynamic programming proceeds by ignoring the effect of these vertices and only keeping tabs on how many vertices from $B$ exist in each component. At the end, we only take into account the solutions that gather all the vertices of $B$ in one component, compute the actual cost of those solutions, and pick the best one. In the following, we provide the details of our method and prove its correctness.

Let $B$ be the set of all vertices whose weight is larger than $n\omega/A$.

Lemma 11. All the vertices of $B$ are connected in the optimal solution.

Proof. There are at most $n$ components, so there is a component, say $C$, whose total weight is not less than $A/n$. We claim all the vertices of $B$ are inside this component. The penalty paid by the optimal solution is at most $\omega < A^2/n$. If there is any vertex of $B$ outside $C$, the penalty of the solution is more than $(A/n) \cdot (n\omega/A) = \omega$, yielding a contradiction. □

Next, we round up all the weights to the next multiple of $\theta = \epsilon'\omega/A$ for vertices not in $B$. Define OPT' as the optimal solution of the resulting instance. Let $\mathrm{OPT}_f$ be the length of the forest in OPT, and define $\mathrm{OPT}'_f$ similarly. Let $\mathrm{OPT}_\pi$ and $\mathrm{OPT}'_\pi$ denote the penalty paid by OPT and OPT', respectively. Assume that $\epsilon' < 1$.
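The selection of $B$ and the rounding of the remaining weights can be sketched as below. The sketch assumes integer vertex weights stored in a dictionary and an integer estimate `omega`; the function name `split_heavy_and_round` is hypothetical, and exact rationals are used for $\theta$.

```python
from fractions import Fraction
from math import ceil


def split_heavy_and_round(weights, omega, eps_prime):
    """Return (B, rounded weights, theta) for the case omega / A^2 < 1 / n^2."""
    n = len(weights)
    total = sum(weights.values())                 # A
    threshold = Fraction(n * omega, total)        # n * omega / A
    theta = Fraction(eps_prime) * omega / total   # eps' * omega / A
    heavy = {v for v, w in weights.items() if w > threshold}
    rounded = {
        # Heavy vertices keep their weight; all others are rounded up.
        v: w if v in heavy else ceil(Fraction(w) / theta) * theta
        for v, w in weights.items()
    }
    return heavy, rounded, theta
```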

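Lemma 12. $\mathrm{OPT}'_\pi \le \mathrm{OPT}_\pi + 12n\epsilon'\mathrm{OPT}$.

Proof. We recompute the penalties paid by OPT using the rounded weights. A pair $(s, t)$ not connected in OPT is of one of two kinds: (1) one of $s$ and $t$ is in $B$; or (2) neither of them is in $B$. The total rounding error for the penalties of the first type is bounded by $nA\theta$. There are at most $n^2$ pairs of the second type. Since the weights of these terminals are at most $n\omega/A$, the error is not more than $n^2[2(n\omega/A)\theta + \theta^2]$. Hence, the total error is at most

$$\begin{aligned}
n^2\left[2(n\omega/A)\theta + \theta^2\right] + nA\theta &\le n^2(\epsilon'^2 + 2n\epsilon')\frac{\omega^2}{A^2} + n\epsilon'\omega \\
&\le n^2 \cdot 3n\epsilon'\,\frac{\omega^2}{A^2} + n\epsilon'\omega && \text{because } \epsilon' < 1 \\
&= \left(\frac{3n^2\omega}{A^2} + 1\right)n\epsilon'\omega \\
&\le 4n\epsilon'\omega && \text{because } \frac{\omega}{A^2} < \frac{1}{n^2},
\end{aligned}$$

which is no more than $12n\epsilon'\mathrm{OPT}$ as desired. □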

Suppose we use a dynamic programming approach similar to the previous subsections to find the approximately minimum forest length for any specified collected prize amount; in particular, we obtain a bicriteria $(1+\epsilon/2, 1-\epsilon')$-approximate solution. During this process, we ignore the weights associated with vertices in $B$. Consider a DP state $\chi = (\mathcal{K}, \Pi)$ corresponding to a dissection square $R$. Each component $\kappa \in \mathcal{K}$ looks like $(P, \Sigma, \mu)$: the new piece of information, $\mu$, is an integer denoting the number of vertices of $B$ inside $\kappa$. Extending the previous algorithm to populate the new DP table is simple. Finally, we look at all the configurations $\chi$ for the bounding box such that the $\mu$ value of one component is exactly $|B|$ whereas it is zero for all other components. This guarantees that all elements of $B$ are inside the former component, and hence we can add up the penalties involving those vertices. Let $\mathcal{K} = \{\kappa_1, \kappa_2, \dots, \kappa_q\}$ where $\kappa_i = (P_i, \Sigma_i, \mu_i)$, and let $\kappa_1$ be the component containing $B$. The additional cost due to vertices of $B$ is

$$\left(\sum_{v \in B}\phi(v)\right) \cdot \left(\sum_{i=2}^{q}\Sigma_i\right).$$

Finally, we report the best solution corresponding to these configurations.
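The final filtering step and the extra penalty term can be sketched as follows. This is an illustration only: each component is reduced to a pair $(\Sigma, \mu)$, portals are omitted, and the function name `extra_penalty_for_heavy` is hypothetical.

```python
def extra_penalty_for_heavy(components, heavy_weights):
    """Penalty between the vertices of B and every component not containing B.

    components: list of (sigma, mu) pairs, where mu counts vertices of B.
    Returns None when the vertices of B are not all in a single component.
    """
    size = len(heavy_weights)
    if size == 0:
        return 0
    holders = [i for i, (_, mu) in enumerate(components) if mu == size]
    if len(holders) != 1 or any(mu not in (0, size) for _, mu in components):
        return None
    rest = sum(sigma for i, (sigma, _) in enumerate(components) if i != holders[0])
    return sum(heavy_weights) * rest
```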

Proof of Theorem 3. Let us first see that the algorithm described runs in polynomial time. It is sufficient to bound the number of configurations. The new piece of information has at most $n$ possibilities. Further, $\Sigma$ is always a multiple of $\theta$ and takes at most polynomially many (in $n$ and $1/\epsilon'$) values. Similarly, $\Pi$ is always a multiple of $\theta^2$ and takes at most polynomially many values.

We pick $\epsilon' = \frac{\epsilon}{24n}$. By Lemmas 11 and 12, the rounding does not increase the penalties paid by the optimal solution by more than $(\epsilon/2)\mathrm{OPT}$. We then utilize the algorithm described for S-MPCSF to find a solution of cost at most $(1+\epsilon/2)\mathrm{OPT}_f + \mathrm{OPT}_\pi + (\epsilon/2)\mathrm{OPT} \le (1+\epsilon)\mathrm{OPT}$. Finally, changing the weights back to the original values clearly does not increase the cost. □



6 Evidence for Hardness 

So far, PTASs for geometric problems in the Euclidean plane, including ours and those of Arora [2] and Mitchell [29], can be easily generalized to Euclidean $d$-dimensional space for any constant $d > 2$. However, we can prove the following theorem on the hardness of the problem in Euclidean $d$-dimensional space.

Theorem 13. If the notorious densest $k$-subgraph problem is hard to approximate within a factor $O(n^{1/d})$ for some constant $d$, then for any $d' > 2d+1$, the $k$-forest problem in Euclidean $d'$-dimensional space is hard to approximate within a factor $O(n^{\frac{1}{2d} - \frac{1}{d'-1}})$.

Proof. Hajiaghayi and Jain [22] show that if densest $k$-subgraph is hard to approximate within a factor $O(n^{1/d})$, then the $k$-forest problem on stars is hard to approximate within a factor $O(n^{1/(2d)})$. On the other hand, Gupta [20] shows that a tree metric of size $n$ can be embedded into Euclidean $d'$-dimensional space with distortion $O(n^{1/(d'-1)})$. Thus, for any $d' > 2d+1$, we cannot obtain an approximation factor $o(n^{\frac{1}{2d} - \frac{1}{d'-1}})$ for $k$-forest in Euclidean $d'$-dimensional space, since otherwise by solving the problem in Euclidean $d'$-dimensional space, finding an Eulerian tour and shortcutting it, and finally embedding it back into the star, we can obtain a better approximation than $O(n^{1/(2d)})$, a contradiction. □

Note, as mentioned above, that despite extensive study, the current best approximation factor for the notorious densest $k$-subgraph problem is $O(n^{1/3-\epsilon})$ [16], and thus we do not expect to have any PTAS for $k$-forest in 8-dimensional Euclidean space.

Unlike the general cases of these problems, as far as PTASs for the case of Euclidean spaces are concerned, it seems the $k$-forest and prize-collecting Steiner forest problems are essentially equivalent. Indeed, in Lemma 15 we prove that any PTAS for $k$-forest results in a PTAS for prize-collecting Steiner forest, and we believe that any DP algorithm giving a PTAS for PCSF computes along its way the optimal solution to different $k$-forest instances.

Thus, based on the evidence above, we believe that Euclidean $k$-forest and Euclidean prize-collecting Steiner forest have no PTASs in their general forms.

7 Conclusion

Besides presenting a simpler and correct analysis of the PTAS for the Euclidean Steiner forest problem, we showed how the approach can be generalized to solve multiplicative prize-collecting problems.

Generalizing our results to planar graphs, especially obtaining a PTAS for Steiner forest, has been a long-standing open problem in this field. The question was settled very recently by Bateni, Hajiaghayi and Marx [6]. While Borradaile, Klein and Kenyon-Mathieu [11] gave a PTAS for Steiner tree on planar graphs, a main ingredient of their algorithm is solving Steiner tree on graphs of bounded treewidth. In sharp contrast, however, Gassner [18] showed recently that Steiner forest is NP-hard even on graphs of treewidth at most 3. Bateni et al. [6] give a PTAS for the problem on graphs of bounded treewidth, and use it to obtain a PTAS for planar and bounded-genus graphs.

Last but not least, obtaining any improvement over the approximation factor 2.54 in [22] for multiplicative prize-collecting Steiner forest in general graphs seems very interesting.

A Deferred proofs and further discussion

Proof of Lemma 0. The proof of Theorem 5 (although not reproduced here) does not increase $\mathrm{comp}(F \cap C)$. Hence, it suffices to prove the result for the forest $F$ specified at the beginning of Section 2. Observe that by (II), $F \cap C$ consists merely of singleton points, because no Steiner point lies on a line $\ell \in C$. Further notice that the $\ell_1$-length of $F$ is at most $\sqrt{2}\,\mathrm{length}(F)$. Let $F_x$ be the total absolute distance $F$ travels in the $x$ direction. Since the $x$-coordinate difference of any two consecutive break-points of $F$ is a multiple of 2 and the intersection with vertical lines of $C$ occurs at coordinates of the form $(2i, y)$, the total number of intersections with vertical lines is exactly $F_x/2$. We can similarly argue for the intersections with horizontal lines, and finally conclude that $\mathrm{comp}(F \cap C) \le \frac{\sqrt{2}}{2}\mathrm{length}(F)$. □

Proof of Lemma 9. This is clearly true for the base cases of the DP since we go over all the possibilities. Next, take any configuration $\chi = (\mathcal{K}, \mathcal{P})$ corresponding to a non-leaf dissection square $R$, and suppose there is a subsolution $F$ with respect to $R$ compatible with $\chi$, such that any subsolution $F'$ formed by restricting $F$ to a dissection square $R'$ which is a descendant of $R$ is compatible with some configuration $\chi'$ of $R'$. Let $F_i$ be the subsolutions restricted to the subsquares $R_i$. Each of them is thus compatible with an appropriate $\chi_i = (\mathcal{K}_i, \mathcal{P}_i)$. By the inductive hypothesis, the dynamic programming states $T_{R_i}[\chi_i]$ have been correctly computed. Each connected component of $F$ not connected to $\partial R$ has to have all its terminal pairs satisfied. This is taken care of by checking the partition $\mathcal{P}'$: the terminals in components that do not advance in the dynamic table to $T_R[\chi]$ have their demands satisfied internally. In addition, the locality property for $F$ ensures the configurations will be consistent, and hence we perform an update of $T_R[\chi]$ from $T_{R_i}[\chi_i]$. This finishes the completeness proof.

Verifying that the update rule is sound is trivial. If the four configurations $\chi_i$ update $\chi$, then there exists a subsolution $F$, formed by the union of the corresponding subsolutions $F_i$, that is compatible with $\chi$. □

A.l The preliminary conditions for multiplicative prizes 

In Section 2, we said that standard perturbation and scaling techniques allow us to assume, with a cost increase of at most $O(\epsilon\,\mathrm{OPT})$, that the bounding box of the instance has side length at most $n^2\epsilon^{-1}\mathrm{OPT}$, while restricting all vertices and Steiner points to points of the form $(2i+1, 2j+1)$ for integers $i$ and $j$. The claim is based on the following two premises:

1. If $d$ is the maximum distance of a pair in $\mathcal{D}$, then $\mathrm{OPT} \ge d$.

2. If $u$ and $v$ are farther apart than $n^2d$, they cannot be connected in the optimum solution.

Using this, the instance can be broken up into disjoint subinstances and then the perturbation can be carried out. However, the first premise is false in the case of multiplicative prizes since not all the pairs need to be connected. Next we show how similar conditions can be guaranteed in this case.

The value of OPT can be guessed using binary search. To begin the search, we can get crude bounds of $\omega/n \le \mathrm{OPT} \le \omega$, using simple approximation algorithms for the general cases of PCSF and $k$-forest (see footnote 5). Knowing OPT, we build a graph $G'$ on the vertices: there is an edge between $u$ and $v$ if and only if their distance is at most OPT. The diameter of each connected component is at most $n\,\mathrm{OPT}$. We consider each of them separately, since two vertices in different components cannot be connected in the optimal solution.
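Splitting the instance along the threshold graph $G'$ can be sketched as below. This is a simple quadratic-time illustration, assuming the vertices are given as coordinate tuples and `opt_guess` is the current guess for OPT; the function name `split_by_threshold` is hypothetical.

```python
from math import dist


def split_by_threshold(points, opt_guess):
    """Group point indices into connected components of the graph that joins
    two points whenever their Euclidean distance is at most opt_guess."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= opt_guess:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Each returned group can then be treated as an independent subinstance.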

The side length of the bounding box is at most $n\,\mathrm{OPT}$. Scale the instance by $8\epsilon^{-1}$ and let $\mathrm{OPT}' = 8\epsilon^{-1}\mathrm{OPT}$ denote the new optimal value. Build a grid in the bounding box by lines with equations $x = 2i$ and $y = 2j$ for integers $i$ and $j$. Move each vertex and Steiner point to the closest point of the form $(2i+1, 2j+1)$. Notice that there are at most $n$ Steiner points. Assuming $\mathrm{OPT} > 0$, the change in the solution value due to the perturbation is at most $2n \cdot 4 = 8n \le \epsilon\,\mathrm{OPT}'$. Hence, we can assume that

• the side length of the bounding box is at most $n\epsilon^{-1}\mathrm{OPT}'$, and

• the vertices and Steiner points are at coordinates $(2i+1, 2j+1)$ for integers $i$ and $j$.
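The snapping step can be written in a couple of lines. In this sketch the function name `snap_to_odd_grid` is hypothetical; each coordinate moves by at most 1, and points lying exactly on a grid line (an even coordinate) are sent to the larger odd neighbour, which is an arbitrary tie-breaking choice.

```python
from math import floor


def snap_to_odd_grid(x, y):
    """Move (x, y) to a nearest point of the form (2i + 1, 2j + 1)."""
    return 2 * floor(x / 2) + 1, 2 * floor(y / 2) + 1
```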

A.2 k-MST as a special case of S-MPCSF 

Here we show that (even the symmetric) S-MPCSF is a generalization of the rooted $k$-MST problem (for which the best approximation guarantee is 2). Suppose we are given an instance $\mathcal{I}$ of the rooted $k$-MST problem. It consists of a graph $G(V, E)$, edge lengths $c_e$, a root vertex $r$ and a number $k$. Suppose $r$ is not to be counted among the $k$ vertices. We build the new instance $\mathcal{I}'$ of the S-MPCSF problem as follows. The graph is the same as $G$. The weights of all vertices are one, except for $r$ whose weight is $n^2$. Then, the goal will be to find the cheapest forest that gathers a prize of at least $S = (n^2+k)^2 = n^4 + 2n^2k + k^2$.

5 The best known approximation factors for these problems are 2.54 and $\min\{\sqrt{k}, \sqrt{n}\}$, respectively.
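The construction of the S-MPCSF instance can be sketched as follows, together with a helper that evaluates the collected prize of a forest given as a list of vertex sets (one per component). Both function names are hypothetical, and $n$ is taken to be the number of vertices including the root.

```python
def kmst_to_smpcsf(vertices, root, k):
    """Weights and prize target of the S-MPCSF instance built from rooted k-MST."""
    n = len(vertices)
    weights = {v: (n * n if v == root else 1) for v in vertices}
    target = (n * n + k) ** 2
    return weights, target


def collected_prize(components, weights):
    """Sum over components of the squared total weight of the component."""
    return sum(sum(weights[v] for v in comp) ** 2 for comp in components)
```

For instance, with four vertices and $k = 2$, a component containing the root and two other vertices reaches the target, while one containing the root and a single other vertex cannot, however the remaining vertices are grouped.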

Theorem 14. The instance $\mathcal{I}$ of the rooted $k$-MST problem is equivalent to the instance $\mathcal{I}'$ of the S-MPCSF problem.

Proof. As we noted in Subsection 1.2, in the case of polynomially bounded integer weights, we can make sure the returned solution collects a prize of at least $S$ (without any approximation factor). This can be achieved by picking $\epsilon' < 1/S$.

Obviously, any tree connecting k vertices to the root is translated to a forest that collects a 
prize of at least S. Let each vertex not spanned by the tree be a singleton component in the forest. 

Finally, we claim that any solution of value $S$ or higher translates to a solution of value at least $k$ for the original instance. The resulting tree is just the component of the forest containing the root vertex. Suppose for the sake of reaching a contradiction that the component spans $k' < k$ non-root vertices. The total prize collected is at most

$$\begin{aligned}
(n^2+k')^2 + (n-k'-1)^2 &\le n^4 + 2n^2k' + k'^2 + n^2 \\
&= S + 2n^2(k'-k) + k'^2 + n^2 - k^2 \\
&< S + 2n^2(k'-k+1) \\
&\le S,
\end{aligned}$$

yielding a contradiction, and proving the supposition is false. □

A.3 PCSF vs. k-forest

Lemma 15. An $\alpha$-approximation algorithm for the $k$-forest problem gives an $\alpha(1+\epsilon)$-approximation algorithm for the prize-collecting Steiner forest problem, for any constant $\epsilon > 0$.

Proof. We show how to approximate a PCSF instance $\mathcal{I}$ by invoking several (polynomially many) instances $\mathcal{I}'$ of the $k$-forest problem. Obtain an estimate $\omega$ for $\mathcal{I}$ such that $\omega/3 \le \mathrm{OPT} \le \omega$, using a general-case 3-approximation algorithm. Let $\pi_i$ be the penalty of the pair $i$ in $\mathcal{I}$. Without loss of generality, we can assume that $\pi_i \le 2\omega$ for any pair $i$. Let $\theta = \epsilon\omega/(3n)$. Place $p_i = \lfloor \pi_i/\theta \rfloor$ copies of the pair $i$ in $\mathcal{I}'$. Find an $\alpha$-approximate solution to the resulting $k$-forest instance for every value of $0 \le k \le n'$, where $n'$ is the number of pairs in $\mathcal{I}'$. Compute the PCSF value for each of these solutions and report the best one.
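The copy counts used in this reduction can be computed as in the sketch below. It assumes the penalties and $\omega$ are integers or exact rationals and treats $n$ as an explicit parameter; the function name `copies_per_pair` is hypothetical.

```python
from fractions import Fraction
from math import floor


def copies_per_pair(penalties, omega, eps, n):
    """Return the copy counts p_i = floor(pi_i / theta) and theta = eps * omega / (3n)."""
    theta = Fraction(eps) * omega / (3 * n)
    return [floor(Fraction(p) / theta) for p in penalties], theta
```

With $\pi_i \le 2\omega$, every count is at most $6n/\epsilon$, which keeps the $k$-forest instance polynomial in size.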

We show that at least one of these candidate solutions is good. Let $\mathrm{OPT}_f$ and $\mathrm{OPT}_\pi$ be the length of the forest and the paid penalty of the optimal solution, respectively. Suppose OPT connects a subset of terminal pairs $Q$. Then, $\mathrm{OPT}_\pi = \sum_{i \notin Q}\pi_i$. Focus on the candidate solution with $k = \sum_{i \in Q}\lfloor \pi_i/\theta \rfloor$. The optimal length for the corresponding $k$-forest instance is at most $\mathrm{OPT}_f$, because a possible solution is that of connecting the copies of $Q$. To compute the PCSF value, we add the penalty of the pairs that are not connected by this forest. We can assume either all or no copies of each pair are connected. The number of copies not connected is at most $n' - k$, and their penalties sum to no more than

$$\begin{aligned}
\sum_{i \text{ not connected}} \pi_i &\le \sum_{i \text{ not connected}} (p_i\theta + \theta) \\
&\le \sum_{i \text{ not connected}} p_i\theta + n\theta \\
&\le (n'-k)\theta + n\theta \\
&\le \mathrm{OPT}_\pi + n\theta \\
&= \mathrm{OPT}_\pi + \epsilon\omega/3 \\
&\le \mathrm{OPT}_\pi + \epsilon\,\mathrm{OPT}.
\end{aligned}$$

Thus, the PCSF value of the best candidate solution is at most $\alpha\mathrm{OPT}_f + \mathrm{OPT}_\pi + \epsilon\,\mathrm{OPT} \le \alpha(1+\epsilon)\mathrm{OPT}$. It remains to show the instances $\mathcal{I}'$ have polynomial size. Since $\pi_i \le 2\omega$, each pair $i$ will have $p_i \le 6n\epsilon^{-1}$ copies. Hence, $\mathcal{I}'$ has polynomial size and we can use the approximation algorithm for $k$-forest. □